Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 714 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 67.1 KiB |
| Average record size in memory | 96.2 B |
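As a quick sanity check, the average record size can be approximated from the two rounded figures in the table above. This is only a sketch: the variable names are illustrative, and because the total is rounded to one decimal place the result agrees with the report only approximately.

```python
# Approximate the average record size from the rounded totals in the table.
KIB = 1024            # bytes per KiB
total_kib = 67.1      # "Total size in memory" (rounded in the report)
n_rows = 714          # "Number of observations"

avg_record_bytes = total_kib * KIB / n_rows
print(f"{avg_record_bytes:.1f} B")
```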
Variable types
| Numeric | 6 |
|---|---|
| Categorical | 6 |
Alerts
| Alert | Type |
|---|---|
| Name has a high cardinality: 714 distinct values | High cardinality |
| Ticket has a high cardinality: 542 distinct values | High cardinality |
| df_index is highly correlated with PassengerId | High correlation |
| PassengerId is highly correlated with df_index | High correlation |
| Pclass is highly correlated with Fare | High correlation |
| Pclass is highly correlated with Fare and 1 other field | High correlation |
| Fare is highly correlated with Pclass | High correlation |
| Sex is highly correlated with Survived | High correlation |
| Survived is highly correlated with Sex | High correlation |
| Embark is highly correlated with Pclass | High correlation |
| df_index is uniformly distributed | Uniform |
| PassengerId is uniformly distributed | Uniform |
| Name is uniformly distributed | Uniform |
| Ticket is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| PassengerId has unique values | Unique |
| Name has unique values | Unique |
| SibSp has 471 (66.0%) zeros | Zeros |
| Parch has 521 (73.0%) zeros | Zeros |
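The percentages quoted in the zero and cardinality alerts follow directly from the counts and the 714 observations. A minimal sketch of that arithmetic is shown here; the helper name `percent` and the dictionaries are illustrative and not part of pandas-profiling.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

n_rows = 714                                  # number of observations
zeros = {"SibSp": 471, "Parch": 521}          # zero counts from the alerts
distinct = {"Name": 714, "Ticket": 542}       # distinct counts from the alerts

zero_share = {col: percent(c, n_rows) for col, c in zeros.items()}
distinct_share = {col: percent(c, n_rows) for col, c in distinct.items()}

print(zero_share)
print(distinct_share)
```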
Reproduction
| Analysis started | 2022-10-09 08:01:16.724608 |
|---|---|
| Analysis finished | 2022-10-09 08:01:23.340626 |
| Duration | 6.62 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 714 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 447.5826331 |
| Minimum | 0 |
| Maximum | 890 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 49.65 |
| Q1 | 221.25 |
| median | 444 |
| Q3 | 676.75 |
| 95-th percentile | 848.7 |
| Maximum | 890 |
| Range | 890 |
| Interquartile range (IQR) | 455.5 |
Descriptive statistics
| Standard deviation | 259.1195244 |
|---|---|
| Coefficient of variation (CV) | 0.578931141 |
| Kurtosis | -1.224109035 |
| Mean | 447.5826331 |
| Median Absolute Deviation (MAD) | 227.5 |
| Skewness | -0.0006094557038 |
| Sum | 319574 |
| Variance | 67142.92794 |
| Monotonicity | Strictly increasing |
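Several of the figures above are derived from others, so they can be cross-checked with simple arithmetic. The sketch below recomputes the coefficient of variation, variance, IQR, range and sum for df_index from the reported mean, standard deviation, quartiles and extremes. The variable names are illustrative, and small differences in the last digits are expected because the inputs are rounded.

```python
# Values copied from the df_index tables above.
n = 714
mean = 447.5826331
std = 259.1195244
q1, q3 = 221.25, 676.75
minimum, maximum = 0, 890

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values

print(f"CV={cv:.9f} variance={variance:.5f} IQR={iqr} range={value_range} sum={total:.0f}")
```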
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 621 | 1 | 0.1% |
| 594 | 1 | 0.1% |
| 595 | 1 | 0.1% |
| 597 | 1 | 0.1% |
| 599 | 1 | 0.1% |
| 600 | 1 | 0.1% |
| 603 | 1 | 0.1% |
| 604 | 1 | 0.1% |
| 605 | 1 | 0.1% |
| Other values (704) | 704 | 98.6% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 10 | 1 | 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 890 | 1 | 0.1% |
| 889 | 1 | 0.1% |
| 887 | 1 | 0.1% |
| 886 | 1 | 0.1% |
| 885 | 1 | 0.1% |
| 884 | 1 | 0.1% |
| 883 | 1 | 0.1% |
| 882 | 1 | 0.1% |
| 881 | 1 | 0.1% |
| 880 | 1 | 0.1% |
PassengerId
Real number (ℝ≥0)
HIGH CORRELATION, UNIFORM, UNIQUE
| Distinct | 714 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 448.5826331 |
| Minimum | 1 |
| Maximum | 891 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 50.65 |
| Q1 | 222.25 |
| median | 445 |
| Q3 | 677.75 |
| 95-th percentile | 849.7 |
| Maximum | 891 |
| Range | 890 |
| Interquartile range (IQR) | 455.5 |
Descriptive statistics
| Standard deviation | 259.1195244 |
|---|---|
| Coefficient of variation (CV) | 0.5776405624 |
| Kurtosis | -1.224109035 |
| Mean | 448.5826331 |
| Median Absolute Deviation (MAD) | 227.5 |
| Skewness | -0.0006094557038 |
| Sum | 320288 |
| Variance | 67142.92794 |
| Monotonicity | Strictly increasing |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.1% |
| 622 | 1 | 0.1% |
| 595 | 1 | 0.1% |
| 596 | 1 | 0.1% |
| 598 | 1 | 0.1% |
| 600 | 1 | 0.1% |
| 601 | 1 | 0.1% |
| 604 | 1 | 0.1% |
| 605 | 1 | 0.1% |
| 606 | 1 | 0.1% |
| Other values (704) | 704 | 98.6% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 10 | 1 | 0.1% |
| 11 | 1 | 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 891 | 1 | 0.1% |
| 890 | 1 | 0.1% |
| 888 | 1 | 0.1% |
| 887 | 1 | 0.1% |
| 886 | 1 | 0.1% |
| 885 | 1 | 0.1% |
| 884 | 1 | 0.1% |
| 883 | 1 | 0.1% |
| 882 | 1 | 0.1% |
| 881 | 1 | 0.1% |
Survived
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| 0 | 424 |
| 1 | 290 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 714 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 714 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 714 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 714 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 424 | |
| 1 | 290 |
Pclass
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| 3 | 355 |
| 1 | 186 |
| 2 | 173 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 714 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3 |
|---|---|
| 2nd row | 1 |
| 3rd row | 3 |
| 4th row | 1 |
| 5th row | 3 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 714 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 714 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 714 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 355 | |
| 1 | 186 | |
| 2 | 173 |
Name
Categorical
| Distinct | 714 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| Braund, Mr. Owen Harris | 1 |
| Kimball, Mr. Edwin Nelson Jr | 1 |
| Chapman, Mr. John Henry | 1 |
| Van Impe, Mr. Jean Baptiste | 1 |
| Johnson, Mr. Alfred | 1 |
| Other values (709) | 709 |
Length
| Max length | 82 |
|---|---|
| Median length | 52 |
| Mean length | 27.69327731 |
| Min length | 13 |
Characters and Unicode
| Total characters | 19773 |
|---|---|
| Distinct characters | 59 |
| Distinct categories | 7 ? |
| Distinct scripts | 2 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 714 ? |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | Braund, Mr. Owen Harris |
|---|---|
| 2nd row | Cumings, Mrs. John Bradley (Florence Briggs Thayer) |
| 3rd row | Heikkinen, Miss. Laina |
| 4th row | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 5th row | Allen, Mr. William Henry |
Common Values
| Value | Count | Frequency (%) |
| Braund, Mr. Owen Harris | 1 | 0.1% |
| Kimball, Mr. Edwin Nelson Jr | 1 | 0.1% |
| Chapman, Mr. John Henry | 1 | 0.1% |
| Van Impe, Mr. Jean Baptiste | 1 | 0.1% |
| Johnson, Mr. Alfred | 1 | 0.1% |
| Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") | 1 | 0.1% |
| Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy) | 1 | 0.1% |
| Torber, Mr. Ernst William | 1 | 0.1% |
| Homer, Mr. Harry ("Mr E Haven") | 1 | 0.1% |
| Lindell, Mr. Edvard Bengtsson | 1 | 0.1% |
| Other values (704) | 704 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| mr | 402 | 13.5% |
| miss | 146 | 4.9% |
| mrs | 112 | 3.8% |
| william | 55 | 1.9% |
| john | 36 | 1.2% |
| master | 36 | 1.2% |
| henry | 28 | 0.9% |
| charles | 19 | 0.6% |
| james | 18 | 0.6% |
| george | 18 | 0.6% |
| Other values (1297) | 2097 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2255 | 11.4% |
| r | 1591 | 8.0% |
| e | 1390 | 7.0% |
| a | 1375 | 7.0% |
| i | 1113 | 5.6% |
| s | 1095 | 5.5% |
| n | 1090 | 5.5% |
| l | 914 | 4.6% |
| M | 884 | 4.5% |
| o | 818 | 4.1% |
| Other values (49) | 7248 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 12785 | |
| Uppercase Letter | 2972 | 15.0% |
| Space Separator | 2255 | 11.4% |
| Other Punctuation | 1500 | 7.6% |
| Open Punctuation | 125 | 0.6% |
| Close Punctuation | 125 | 0.6% |
| Dash Punctuation | 11 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1591 | |
| e | 1390 | |
| a | 1375 | |
| i | 1113 | |
| s | 1095 | |
| n | 1090 | |
| l | 914 | 7.1% |
| o | 818 | 6.4% |
| t | 557 | 4.4% |
| d | 423 | 3.3% |
| Other values (16) | 2419 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 884 | |
| A | 222 | 7.5% |
| J | 176 | 5.9% |
| H | 169 | 5.7% |
| E | 149 | 5.0% |
| C | 147 | 4.9% |
| S | 145 | 4.9% |
| W | 120 | 4.0% |
| B | 119 | 4.0% |
| L | 110 | 3.7% |
| Other values (15) | 731 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 715 | |
| , | 714 | |
| " | 70 | 4.7% |
| / | 1 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2255 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 125 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 125 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 11 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 15757 | |
| Common | 4016 | 20.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 1591 | 10.1% |
| e | 1390 | 8.8% |
| a | 1375 | 8.7% |
| i | 1113 | 7.1% |
| s | 1095 | 6.9% |
| n | 1090 | 6.9% |
| l | 914 | 5.8% |
| M | 884 | 5.6% |
| o | 818 | 5.2% |
| t | 557 | 3.5% |
| Other values (41) | 4930 |
Common
| Value | Count | Frequency (%) |
| (space) | 2255 | 56.2% |
| . | 715 | 17.8% |
| , | 714 | 17.8% |
| ( | 125 | 3.1% |
| ) | 125 | 3.1% |
| " | 70 | 1.7% |
| - | 11 | 0.3% |
| / | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 19773 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 2255 | 11.4% |
| r | 1591 | 8.0% |
| e | 1390 | 7.0% |
| a | 1375 | 7.0% |
| i | 1113 | 5.6% |
| s | 1095 | 5.5% |
| n | 1090 | 5.5% |
| l | 914 | 4.6% |
| M | 884 | 4.5% |
| o | 818 | 4.1% |
| Other values (49) | 7248 |
Sex
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| male | 453 |
| female | 261 |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.731092437 |
| Min length | 4 |
Characters and Unicode
| Total characters | 3378 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | male |
|---|---|
| 2nd row | female |
| 3rd row | female |
| 4th row | female |
| 5th row | male |
Common Values
| Value | Count | Frequency (%) |
| male | 453 | |
| female | 261 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| male | 453 | |
| female | 261 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 975 | |
| m | 714 | |
| a | 714 | |
| l | 714 | |
| f | 261 | 7.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3378 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 975 | |
| m | 714 | |
| a | 714 | |
| l | 714 | |
| f | 261 | 7.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3378 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 975 | |
| m | 714 | |
| a | 714 | |
| l | 714 | |
| f | 261 | 7.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3378 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 975 | |
| m | 714 | |
| a | 714 | |
| l | 714 | |
| f | 261 | 7.7% |
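The character-level tables for Sex can be reproduced from the two value counts alone, which also explains why "e" is the most frequent character (it appears once in "male" and twice in "female"). The following is a small illustrative sketch, not the profiler's own implementation.

```python
from collections import Counter

# Value counts taken from the "Common Values" table for Sex.
value_counts = {"male": 453, "female": 261}

# Each character of a value occurs once per row holding that value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

print(char_counts.most_common())
print(total_chars, round(mean_length, 9))
```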
Age
Real number (ℝ≥0)
| Distinct | 88 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.69911765 |
| Minimum | 0.42 |
| Maximum | 80 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0.42 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 20.125 |
| median | 28 |
| Q3 | 38 |
| 95-th percentile | 56 |
| Maximum | 80 |
| Range | 79.58 |
| Interquartile range (IQR) | 17.875 |
Descriptive statistics
| Standard deviation | 14.52649733 |
|---|---|
| Coefficient of variation (CV) | 0.4891221855 |
| Kurtosis | 0.1782741536 |
| Mean | 29.69911765 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 0.3891077823 |
| Sum | 21205.17 |
| Variance | 211.0191247 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 30 | 4.2% |
| 22 | 27 | 3.8% |
| 18 | 26 | 3.6% |
| 19 | 25 | 3.5% |
| 28 | 25 | 3.5% |
| 30 | 25 | 3.5% |
| 21 | 24 | 3.4% |
| 25 | 23 | 3.2% |
| 36 | 22 | 3.1% |
| 29 | 20 | 2.8% |
| Other values (78) | 467 | 65.4% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.1% |
| 0.67 | 1 | 0.1% |
| 0.75 | 2 | 0.3% |
| 0.83 | 2 | 0.3% |
| 0.92 | 1 | 0.1% |
| 1 | 7 | 1.0% |
| 2 | 10 | 1.4% |
| 3 | 6 | 0.8% |
| 4 | 10 | 1.4% |
| 5 | 4 | 0.6% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 80 | 1 | 0.1% |
| 74 | 1 | 0.1% |
| 71 | 2 | 0.3% |
| 70.5 | 1 | 0.1% |
| 70 | 2 | 0.3% |
| 66 | 1 | 0.1% |
| 65 | 3 | 0.4% |
| 64 | 2 | 0.3% |
| 63 | 2 | 0.3% |
| 62 | 4 | 0.6% |
SibSp
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.512605042 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 471 |
| Zeros (%) | 66.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.9297834541 |
|---|---|
| Coefficient of variation (CV) | 1.813839853 |
| Kurtosis | 7.044950785 |
| Mean | 0.512605042 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.519576762 |
| Sum | 366 |
| Variance | 0.8644972716 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 471 | 66.0% |
| 1 | 183 | 25.6% |
| 2 | 25 | 3.5% |
| 4 | 18 | 2.5% |
| 3 | 12 | 1.7% |
| 5 | 5 | 0.7% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 471 | 66.0% |
| 1 | 183 | 25.6% |
| 2 | 25 | 3.5% |
| 3 | 12 | 1.7% |
| 4 | 18 | 2.5% |
| 5 | 5 | 0.7% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 5 | 0.7% |
| 4 | 18 | 2.5% |
| 3 | 12 | 1.7% |
| 2 | 25 | 3.5% |
| 1 | 183 | 25.6% |
| 0 | 471 | 66.0% |
Parch
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.431372549 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 521 |
| Zeros (%) | 73.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8532893658 |
|---|---|
| Coefficient of variation (CV) | 1.978079893 |
| Kurtosis | 8.853125533 |
| Mean | 0.431372549 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.618913989 |
| Sum | 308 |
| Variance | 0.7281027418 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 521 | 73.0% |
| 1 | 110 | 15.4% |
| 2 | 68 | 9.5% |
| 5 | 5 | 0.7% |
| 3 | 5 | 0.7% |
| 4 | 4 | 0.6% |
| 6 | 1 | 0.1% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 521 | 73.0% |
| 1 | 110 | 15.4% |
| 2 | 68 | 9.5% |
| 3 | 5 | 0.7% |
| 4 | 4 | 0.6% |
| 5 | 5 | 0.7% |
| 6 | 1 | 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1 | 0.1% |
| 5 | 5 | 0.7% |
| 4 | 4 | 0.6% |
| 3 | 5 | 0.7% |
| 2 | 68 | 9.5% |
| 1 | 110 | 15.4% |
| 0 | 521 | 73.0% |
Ticket
Categorical
| Distinct | 542 |
|---|---|
| Distinct (%) | 75.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| 347082 | 7 |
| 3101295 | 6 |
| CA 2144 | 6 |
| 347088 | 6 |
| 382652 | 5 |
| Other values (537) | 684 |
Length
| Max length | 18 |
|---|---|
| Median length | 17 |
| Mean length | 6.841736695 |
| Min length | 3 |
Characters and Unicode
| Total characters | 4885 |
|---|---|
| Distinct characters | 35 |
| Distinct categories | 5 ? |
| Distinct scripts | 2 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 431 ? |
|---|---|
| Unique (%) | 60.4% |
Sample
| 1st row | A/5 21171 |
|---|---|
| 2nd row | PC 17599 |
| 3rd row | STON/O2. 3101282 |
| 4th row | 113803 |
| 5th row | 373450 |
Common Values
| Value | Count | Frequency (%) |
| 347082 | 7 | 1.0% |
| 3101295 | 6 | 0.8% |
| CA 2144 | 6 | 0.8% |
| 347088 | 6 | 0.8% |
| 382652 | 5 | 0.7% |
| S.O.C. 14879 | 5 | 0.7% |
| 113760 | 4 | 0.6% |
| 1601 | 4 | 0.6% |
| 19950 | 4 | 0.6% |
| 347077 | 4 | 0.6% |
| Other values (532) | 663 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| pc | 50 | 5.5% |
| c.a | 26 | 2.8% |
| a/5 | 15 | 1.6% |
| 2 | 12 | 1.3% |
| ston/o | 12 | 1.3% |
| 347082 | 7 | 0.8% |
| sc/paris | 7 | 0.8% |
| ca | 7 | 0.8% |
| soton/o.q | 6 | 0.7% |
| 3101295 | 6 | 0.7% |
| Other values (567) | 768 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 569 | |
| 1 | 569 | |
| 2 | 472 | |
| 7 | 407 | |
| 4 | 381 | 7.8% |
| 0 | 340 | 7.0% |
| 5 | 324 | 6.6% |
| 6 | 315 | 6.4% |
| 9 | 259 | 5.3% |
| 8 | 234 | 4.8% |
| Other values (25) | 1015 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3870 | |
| Uppercase Letter | 546 | 11.2% |
| Other Punctuation | 247 | 5.1% |
| Space Separator | 202 | 4.1% |
| Lowercase Letter | 20 | 0.4% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 127 | |
| P | 84 | |
| O | 84 | |
| A | 65 | |
| S | 62 | |
| N | 35 | 6.4% |
| T | 31 | 5.7% |
| W | 13 | 2.4% |
| Q | 10 | 1.8% |
| I | 9 | 1.6% |
| Other values (6) | 26 | 4.8% |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 569 | |
| 1 | 569 | |
| 2 | 472 | |
| 7 | 407 | |
| 4 | 381 | |
| 0 | 340 | |
| 5 | 324 | |
| 6 | 315 | |
| 9 | 259 | |
| 8 | 234 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 5 | |
| s | 5 | |
| r | 4 | |
| i | 4 | |
| l | 1 | 5.0% |
| e | 1 | 5.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 166 | |
| / | 81 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 202 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 4319 | |
| Latin | 566 | 11.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 127 | |
| P | 84 | |
| O | 84 | |
| A | 65 | |
| S | 62 | |
| N | 35 | 6.2% |
| T | 31 | 5.5% |
| W | 13 | 2.3% |
| Q | 10 | 1.8% |
| I | 9 | 1.6% |
| Other values (12) | 46 | 8.1% |
Common
| Value | Count | Frequency (%) |
| 3 | 569 | |
| 1 | 569 | |
| 2 | 472 | |
| 7 | 407 | |
| 4 | 381 | |
| 0 | 340 | |
| 5 | 324 | |
| 6 | 315 | |
| 9 | 259 | |
| 8 | 234 | |
| Other values (3) | 449 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4885 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 569 | |
| 1 | 569 | |
| 2 | 472 | |
| 7 | 407 | |
| 4 | 381 | 7.8% |
| 0 | 340 | 7.0% |
| 5 | 324 | 6.6% |
| 6 | 315 | 6.4% |
| 9 | 259 | 5.3% |
| 8 | 234 | 4.8% |
| Other values (25) | 1015 |
Fare
Real number (ℝ≥0)
| Distinct | 220 |
|---|---|
| Distinct (%) | 30.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 34.69451401 |
| Minimum | 0 |
| Maximum | 512.3292 |
| Zeros | 7 |
| Zeros (%) | 1.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7.225 |
| Q1 | 8.05 |
| median | 15.7417 |
| Q3 | 33.375 |
| 95-th percentile | 120 |
| Maximum | 512.3292 |
| Range | 512.3292 |
| Interquartile range (IQR) | 25.325 |
Descriptive statistics
| Standard deviation | 52.9189295 |
|---|---|
| Coefficient of variation (CV) | 1.52528234 |
| Kurtosis | 30.92424901 |
| Mean | 34.69451401 |
| Median Absolute Deviation (MAD) | 8.2334 |
| Skewness | 4.653630368 |
| Sum | 24771.883 |
| Variance | 2800.4131 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 41 | 5.7% |
| 26 | 30 | 4.2% |
| 8.05 | 29 | 4.1% |
| 10.5 | 24 | 3.4% |
| 7.8958 | 23 | 3.2% |
| 7.925 | 18 | 2.5% |
| 7.75 | 14 | 2.0% |
| 7.775 | 14 | 2.0% |
| 26.55 | 13 | 1.8% |
| 7.8542 | 13 | 1.8% |
| Other values (210) | 495 | 69.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7 | 1.0% |
| 4.0125 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6.2375 | 1 | 0.1% |
| 6.4375 | 1 | 0.1% |
| 6.45 | 1 | 0.1% |
| 6.4958 | 2 | 0.3% |
| 6.75 | 2 | 0.3% |
| 6.975 | 2 | 0.3% |
| 7.0458 | 1 | 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 512.3292 | 3 | 0.4% |
| 263 | 4 | 0.6% |
| 262.375 | 2 | 0.3% |
| 247.5208 | 2 | 0.3% |
| 227.525 | 3 | 0.4% |
| 211.5 | 1 | 0.1% |
| 211.3375 | 3 | 0.4% |
| 164.8667 | 2 | 0.3% |
| 153.4625 | 3 | 0.4% |
| 151.55 | 4 | 0.6% |
Embark
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |

| Value | Count |
|---|---|
| 2 | 556 |
| 0 | 130 |
| 1 | 28 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 714 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 0 |
| 3rd row | 2 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 714 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 714 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 714 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 556 | |
| 0 | 130 | 18.2% |
| 1 | 28 | 3.9% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
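The definition above (Pearson's formula applied to rank variables) can be sketched in a few lines of standard-library Python. The function names `rank`, `pearson` and `spearman` are illustrative; this is not the code the profiler runs, and the toy data is made up to show a monotonic but nonlinear relationship.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(rank(x), rank(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]                # monotonic, but not linear
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy input ρ reaches its maximum because the ordering of x and y is identical, while r stays below 1 because the relationship is not a straight line.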
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
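A minimal sketch of this calculation, using the Age and Fare values of the first five rows shown in this report as sample input, also illustrates the invariance under location and scale changes (here with positive scale factors). The function name `pearson` and the chosen offsets and factors are illustrative assumptions.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

age = [22.0, 38.0, 26.0, 35.0, 35.0]          # Age, first five rows
fare = [7.25, 71.2833, 7.925, 53.1, 8.05]     # Fare, first five rows

r = pearson(age, fare)

# Shift and rescale each variable separately; r should not change.
r_transformed = pearson([2 * a + 10 for a in age], [0.5 * f - 3 for f in fare])
print(r, r_transformed)
```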
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
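The pair-counting description above corresponds to the variant usually called tau-a. The sketch below implements exactly that description; the function name `kendall_tau_a` is illustrative, and note that library implementations typically apply a tie correction (tau-b), so results can differ when the data contains ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1      # both variables move in the same direction
        elif s < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])   # 5 concordant, 1 discordant
tau_reversed = kendall_tau_a([1, 2, 3], [3, 2, 1])
print(tau, tau_reversed)
```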
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embark |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | 2 |
| 1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | 0 |
| 2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | 2 |
| 3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | 2 |
| 4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | 2 |
| 5 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | 2 |
| 6 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | 2 |
| 7 | 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | 2 |
| 8 | 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | 0 |
| 9 | 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | 2 |
Last rows
| | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embark |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 704 | 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | 2 |
| 705 | 881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | 2 |
| 706 | 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | 2 |
| 707 | 883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | 2 |
| 708 | 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | 2 |
| 709 | 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | 1 |
| 710 | 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | 2 |
| 711 | 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | 2 |
| 712 | 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | 0 |
| 713 | 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | 1 |